Stepwise regression for unsupervised learning

نویسنده

  • Jonathan Landy
چکیده

I consider unsupervised extensions of the fast stepwise linear regression algorithm [5]. These extensions allow one to efficiently identify highly-representative feature variable subsets within a given set of jointly distributed variables. This in turn allows for the efficient dimensional reduction of large data sets via the removal of redundant features. Fast search is effected here through the avoidance of repeat computations across trial fits, allowing for a full representative-importance ranking of a set of feature variables to be carried out in O(nm) time, where n is the number of variables and m is the number of data samples available. This runtime complexity matches that needed to carry out a single regression and is O(n) faster than that of naive implementations. I present pseudocode suitable for efficient forward, reverse, and forward-reverse unsupervised feature selection. To illustrate the algorithm’s application, I apply it to the problem of identifying representative stocks within a given financial market index – a challenge relevant to the design of Exchange Traded Funds (ETFs). I also characterize the growth of numerical error with iteration step in these algorithms, and finally demonstrate and rationalize the observation that the forward and reverse algorithms return exactly inverted feature orderings in the weakly-correlated feature set regime.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Local Filter Selection Boosts Performance of Automatic Speechreading

We examine general purpose unsupervised techniques for visual preprocesing in machine vision tasks. In particular we analyze a wide variety of principal component and independent component techniques in combination with stepwise regression methods for variable selection. The task at hand is recognition of the first four digits spoken in English using hidden Markov models (HMM) for the recogniti...

متن کامل

Building an asynchronous web-based tool for machine learning classification

Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not bee...

متن کامل

Knowledge-based Clustering as a Conceptual and Algorithmic Environment of Biomedical Data Analysis

While a genuine abundance of biomedical data available nowadays becomes a genuine blessing, it also posses a lot of challenges. The two fundamental and commonly occurring directions in data analysis deal with its supervised or unsupervised pursuits. Our conjecture is that in the area of biomedical data processing and understanding where we encounter a genuine diversity of patterns, problem desc...

متن کامل

Neural networks based EEG-Speech Models

In this paper, we describe three neural network (NN) based EEG-Speech (NES) models that map the unspoken EEG signals to the corresponding phonemes. Instead of using conventional feature extraction techniques, the proposed NES models rely on graphic learning to project both EEG and speech signals into deep representation feature spaces. This NN based linear projection helps to realize multimodal...

متن کامل

Information Theoretic Clustering

Clustering is one of the important topics in pattern recognition. Since only the structure of the data dictates the grouping (unsupervised learning), information theory is an obvious criteria to establish the clustering rule. This paper describes a novel valley seeking clustering algorithm using an information theoretic measure to estimate the cost of partitioning the data set. The information ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1706.03265  شماره 

صفحات  -

تاریخ انتشار 2017